A comparison of freezer‐stored DNA and herbarium tissue samples for chloroplast assembly and genome skimming

Abstract Premise The use of DNA from herbarium specimens is an increasingly important source for evolutionary studies in plant biology, particularly in cases where species are rare or difficult to obtain. Here we compare the utility of DNA from herbarium tissues to their freezer‐stored DNA counterparts via the Hawaiian Plant DNA Library. Methods Plants collected for the Hawaiian Plant DNA Library were simultaneously accessioned as herbarium specimens at the time of collection, from 1994–2019. Paired samples were sequenced using short‐read sequencing and assessed for chloroplast assembly and nuclear gene recovery. Results Herbarium specimen–derived DNA was statistically more fragmented than freezer‐stored DNA derived from fresh tissue, leading to poorer chloroplast assembly and overall lower coverage. The number of nuclear targets recovered varied mostly by total sequencing reads per library and age of specimen, but not by storage method (herbarium or long‐term freezer). Although there was evidence of DNA damage in the samples, there was no evidence that it was related to the length of time in storage, whether frozen or as herbarium specimens. Discussion DNA extracted from herbarium tissues will continue to be invaluable, despite being highly fragmented and degraded. Rare floras would benefit from both traditional herbarium storage methods and extracted DNA freezer banks.

Biological collections are an essential part of biodiversity research (Lavoie, 2013). By preserving and making publicly available a variety of biological specimens, the broader community can accelerate progress on research and educational efforts. Herbaria are the main site of botanical collections and are often broadly accessible via digitization and databasing efforts (Soltis, 2017). Many botanical questions have been addressed by studying herbarium collections, including phenology (Lavoie and Lachance, 2006;Willis et al., 2017;Jones and Daehler, 2018), species distributions (Loiselle et al., 2007;Feeley, 2012), and trait diversity (Heberling, 2022). In addition to studies that leverage the morphological and location information provided by a herbarium specimen, there has been a growing appreciation of how herbarium tissue can be used in genomics studies to further our understanding of the evolutionary relationships and patterns found in these collections.
Because herbarium specimens by definition include their associated collection dates and locations, they can be used to document changes in genetic diversity, especially in species that are rare and/or endangered (Cozzolino et al., 2007;Rosche et al., 2022). In Anacamptis palustris (Jacq.) R. M. Bateman, Pridgeon & M. W. Chase, herbarium specimens were used to compare historical and present-day patterns of genetic diversity (Cozzolino et al., 2007). Large phylogenomic studies now routinely incorporate DNA obtained from herbarium specimens (e.g., Zeng et al., 2018;Nevill et al., 2020;Jiang et al., 2022). In addition, herbarium specimens themselves can be sources of genetic diversity. For example, rare and endangered plant species can be grown from seeds derived from the fruit on herbarium sheets (Wolkis et al., 2022). While seed viability may be variable and species-specific, it does still represent a potential conservation strategy (Godefroid et al., 2011).
Herbarium specimens are especially crucial in the study of rare, endemic floras that are often understudied and lack genomic resources. For example, of the 1400 plant species naturalized in the Hawaiian archipelago, 90% are endemic and nearly 400 are threatened or endangered (Department of Land and Natural Resources, 2023;Pacific Islands Fish and Wildlife Office, 2023). The development of genomic resources in diverse Hawaiian taxa would improve our understanding of adaptation, biogeography, and conservation in this unique flora. For example, the endemic ʻōhiʻa lehua (Metrosideros polymorpha Gaudich.) is a culturally and ecologically important species, holding a central role in both Hawaiian moʻolelo (legend) and in forests across the islands. By sequencing whole genomes of 131 individuals across 11 species in the genus, including ʻōhiʻa lehua, Choi et al. (2021) were able to draw inferences on the demographic history of the genus and the genomic extent of divergent selection in this adaptive radiation. In general, however, studies on the genomics of Hawaiian endemics are limited (but see Bellinger et al., 2022). Despite seemingly endless scientific interest in the radiation of species found in Hawaiʻi, endemic species in Hawaiʻi are often difficult to study due to their small population sizes and remote habitats. Herbaria are well suited for genomic investigations of species for which wild tissue collections are impractical from the perspective of cost, time, and/or safety. The ability to obtain sufficient DNA from a herbarium specimen can quickly jumpstart research on rare species.
Obtaining high-quality DNA from herbarium specimens for downstream applications can be challenging (Savolainen et al., 1995;Staats et al., 2011;Särkinen et al., 2012), but has shown promise via sampling genomes using reduced representation approaches such as double-digest restriction site-associated DNA sequencing (ddRADseq) in very old herbarium specimens with methods tested across four genera in Jordon-Thaden et al. (2020). Hybrid capture protocols have also successfully produced large genomic data sets from relatively old herbarium samples (Hart et al., 2016). Additionally, Kates et al. (2021) showed that specimen age and extent of DNA degradation did not correlate with sequencing success, demonstrating that herbarium-stored specimens of varying age can be used in phylogenomic studies. Despite considerable recent innovations in the area of herbarium DNA extractions, the effects of storage time, species, and their interaction can all limit the type of molecular work that is possible for a particular collection. To counteract the negative effects of time on a herbarium sheet, DNA can be extracted from the fresh tissue of the plant that is being accessioned (or a relative) and vouchered in cold storage while the plant tissue remains in the herbarium. This is the approach of the Hawaiian Plant DNA Library (HPDL), established in 1992 (Morden et al., 1996). Approximately 13,000 specimens have had their DNA extracted and stored at −20°C, representing approximately 89.7% of flowering plant genera and 92.7% of families that are found in Hawaiʻi. The collection also represents 49.3% of the genera and 70.4% of the families comprising ferns and allies found in Hawaiʻi. These DNA and corresponding herbarium specimens are publicly available upon request and have been used in a variety of research projects, including studies on the phylogenetics of native species (Howarth et al., 1997(Howarth et al., , 2003Iwanycki Ahlstrand et al., 2019), leaf fungal diversity (Datlof et al., 2017), and diversification patterns of endemic species across the island archipelago (Hobbs and Baldwin, 2013;Yang et al., 2018).
Here, we set out to understand the effects of age and storage method (herbarium vs. DNA banked in a freezer) on DNA quality for paired samples in the HPDL and the Joseph F. Rock Herbarium at the University of Hawaiʻi (Honolulu, Hawaiʻi, USA). In particular, we focused on (1) sequencing taxa that are endemic to Hawaiʻi to generate new genomic resources from low-coverage genome skimming and (2) comparing library quality and sequencing results from specimens that, when collected, were simultaneously accessioned in the HPDL and the herbarium. By using these paired herbarium tissue and frozen DNA specimens, we remove any genotype-or collection-specific anomalies and can directly infer effects of storage method and age. We find that while both storage methods provide DNA of a quality suitable for high-throughput sequencing, freezer-stored DNA samples are less degraded over time.

Hawaiian Plant DNA Library sampling
Hawaiian Plant DNA Library samples in this study were collected between 1994 and 2019. Upon collection, samples were accessioned in the Joseph F. Rock Herbarium (Appendix 1). Fresh tissue for DNA extraction was field collected into a plastic bag and remained at 4°C until it was extracted via a modified cetyltrimethylammonium bromide (CTAB) protocol and cesium banded (Morden et al., 1996;Randell and Morden, 1999). Purified DNA was stored at −20°C until this analysis. Samples in this analysis were chosen to represent native Hawaiian species, to span a breadth of duration in cold storage, and when possible, to be paired with herbarium sheet tissue based on the original plant specimen that was deposited in the HPDL freezer collection.

DNA extraction and library construction
Small amounts of herbarium specimen tissue were sampled (Appendix S1; see Supporting Information with this article) as input for a modified CTAB DNA extraction protocol (Doyle, 1987). Briefly, this tissue was ground with mortar and pestle in liquid nitrogen and incubated at room temperature in an extraction buffer containing 0.35 M sorbitol and 10 mM βmercaptoethanol to remove phenolic compounds from the plant tissue. The supernatant was removed and replaced with 300 μL of extraction buffer and 400 μL of CTAB lysis buffer, and the samples were then incubated for 1 h at 65°C. After chloroform and isoamyl alcohol phase separation, samples were chilled at −20°C overnight. All DNA samples (herbarium and HPDL) were quantified via the Qubit dsDNA Broad Range Assay Kit (Invitrogen, Waltham, Massachusetts, USA). Before library construction, all samples (both herbarium and HPDL) were 3X bead-cleaned using homemade magnetic beads prepared from Sera-Mag SpeedBeads (Thermo Fisher Scientific, Waltham, Massachusetts, USA) by the Microbial Genomics and Analytical Laboratory Core at the University of Hawaiʻi at Mānoa. Libraries were quantified again after bead-cleaning via the Qubit dsDNA Broad Range Assay Kit. Using the KAPA HyperPlus library kit (Roche, Indianapolis, Indiana, USA), samples were enzymatically fragmented for 4 min before end repair, A-tailing, and ligation per the manufacturer's protocol. The adapters used in the ligation reaction are Y-stub adapters purchased from BadDNA lab at the University of Georgia (Athens, Georgia, USA) (Glenn et al., 2019). PCR was performed for eight cycles using dual-indexed barcoded primers from the BadDNA lab. Samples were 0.8X bead-cleaned prior to PCR to remove adapter dimers, and the PCR products were 1.0X bead-cleaned and eluted into 10 mM Tris HCl for storage. Completed libraries were initially assessed via the Qubit dsDNA High Sensitivity Assay Kit, and then via qPCR using the KAPA Library Quantitation kit. All libraries were diluted to approximately 3 nM and then pooled using equal volumes before being submitted to Novogene (Sacramento, California, USA) for sequencing on a NovaSeq 6000 (Illumina, San Diego, California, USA) with 150-bp paired-end reads.

Sequence analysis
All sequence data were first checked for base quality and the presence of short fragments using FastQC (Andrews, 2010). Trimmomatic v0.39 was used to quality trim the ends of reads and remove the presence of Illumina sequence adapters (Bolger et al., 2014). Paired reads that were retained post-trimming were used as input for chloroplast genome assembly via the assembler GetOrganelle (Jin et al., 2020). Assembly read depth and average insert size were obtained from GetOrganelle output files, and assembly length and GC content were determined using SeqKit (Shen et al., 2016). For final plastid assemblies that consisted of multiple scaffolds, a weighted average was used to characterize GC content. Chloroplast assemblies were annotated for protein-coding genes and ribosomal RNAs using GeSeq (Tillich et al., 2017). Raw sequencing reads and assemblies have been deposited in GenBank (Appendix 2).
In addition to analyzing the organellar sequences, we were interested in understanding the degree to which nuclear gene targets could be recovered. We used HybPiper (Johnson et al., 2016) to map and assemble reads corresponding to the Angiosperms353 single-copy nuclear gene bait set (Johnson et al., 2019), or in the case of Cibotium menziesii Hook., from the GoFlag 451 bait set (Breinholt et al., 2021). Reference sequences of the 353 genes were downloaded from the Kew Tree of Life Explorer (https://treeoflife.kew.org/); for each genus we sequenced, we attempted to download references from a congeneric species when available, but resorted to closely related genera (e.g., within the family) otherwise (Appendix 1). For Cibotium, assembled sequences of the GoFlag 451 loci from the closest relative in the Cyatheales (Thyrsopteris elegans Kunze) were used as a reference; due to low read mapping rates, no nuclear genes were recovered for Cibotium and it was not further analyzed in this context. HybPiper was run with default settings but with coverage for SPAdes lowered to 4 (--cov_cutoff). The total number of assembled contigs across sample type, age of specimen, and relative to sequencing output was evaluated across libraries ( Figure 1). Analysis of the effects of storage method, collection year, and number of sequencing reads on the number of reference nuclear genes recovered was performed using ANOVA in R version 4.2.1 (R Core Team, 2019). Additionally, we sought to determine whether the genome sizes of these species may have affected the proportion of single-copy nuclear genes recovered. To address this, we obtained genome size estimates from the Kew Plant C-value Database (https://cvalues.science.kew.org/). Depending on the data available, we either obtained the average genome size of the relevant genus, or the relevant family if no plants had been sampled within the genus (Appendix S2). Herbarium and HPDL samples were analyzed separately to determine whether the proportion of single-copy nuclear genes recovered was correlated to the average genome size of the most relevant clade using a Pearson correlation coefficient in R version 4.2.1 (R Core Team, 2019).
Herbarium specimens, much like ancient DNA, are subject to DNA degradation most commonly via DNA deamination (Staats et al., 2011;Weiß et al., 2016). To determine the extent to which storage in herbaria affects the DNA within plant cells, we compared herbarium libraries and their freezer-stored DNA counterparts via read mapping to the chloroplast genome assemblies. For each species, reads were mapped from both herbarium and HPDL samples to the more complete HPDLassembled chloroplast using Bowtie2 (Langmead and Salzberg, 2012). Mapped reads were sorted using SAMtools (Li and Durbin, 2009), and polymorphisms were called and filtered using BCFtools (Danecek et al., 2021). Positions with quality <20 were removed. Heterozygous C-to-T substitutions were inferred to be areas of putative deamination, as it is unlikely for all homologous nucleotides to undergo the deamination process. The number of C-to-T heterozygous sites was determined for both the herbarium tissues and freezerstored DNA samples and was used to calculate a proportion of putative deaminated sites relative to the number of all heterozygous loci. The proportion of putative deamination polymorphisms was compared between paired (herbarium/

Sample and library quality
Many herbarium tissue extractions had higher initial DNA concentrations than their cold storage DNA bank counterparts. However, as we normalized input DNA amounts into library construction, the resulting DNA concentrations of the sequencing libraries were equivalent (Appendix S1).

Chloroplast genome assembly
Herbarium tissue library samples had significantly smaller insert sizes of mapped chloroplast reads compared to their freezer-stored DNA paired samples, taking into account covariates of read numbers and year (F 2,24 = 112.864, P < 0.001). There was also a significant interaction effect between library size and sampling year (F 1,24 = 7.392, P < 0.05). Similarly, herbarium tissue samples also had higher amounts of adapter sequences in the reads (F 2,24 = 43.229, P < 0.001), with sampling year a significant covariate in the model (F 1,24 = 4.776, P < 0.05). The ability to assemble a complete circular chloroplast genome was affected by herbarium tissue storage in a genus-specific manner. For example, in Acacia koa the herbarium tissue library was assembled into 26 scaffolds compared to the frozen DNA stored library that assembled into a single linear fragment. Furthermore, the assembly length of the herbarium tissue sample for A. koa was more than 54,000 bp shorter and had reduced mapping coverage (Table 1). Similarly, the Deschampsia nubigena Hillebr. herbarium tissue sample produced an assembly of 11 scaffolds compared to the frozen DNA stored library that assembled into a circular molecule (Table 1). Interestingly, despite the disparity in D. nubigena assemblies, the lengths were similar, with the herbarium tissue sample actually being longer despite having reduced mapping coverage. In contrast to Acacia and Deschampsia, many other genera exhibited remarkable consistency between herbarium tissue and frozen DNA stored samples in terms of assembly completeness and length. For example, comparisons between storage types yielded identically sized circular genomes in Erythrina sandwicensis O. Deg., Santalum ellipticum Gaudich. (from 1995), and Sesbania tomentosa Hook. & Arn., and nearly identical lengths in Carex wahuensis, Clermontia fauriei H. Lév., Clermontia kakeana Meyen, and S. ellipticum (from 2007) ( Table 1).

Nuclear recovery and DNA deamination
The largest predictors of proportion of the 353 reference genes recovered in our sequencing libraries were the sheer number of paired reads that went into the HybPiper analysis T A B L E 1 Source information for DNA samples, chloroplast assembly, and sequencing library quality metrics. Full accession information is available in Appendix 1. Read pairs surviving Trimmomatic. e The percent of reads removed by FastQC due to adapter readthrough. f The proportion of the 353 reference genes recovered; reference used per species is available in Appendix 1. g A single scaffold was at times recovered, but was not circular. h Cibotium was excluded from nuclear analysis because the study focused on Angiosperms353 gene sets.

EVALUATION OF LONG-TERM STORAGE FOR PLANT DNAS
| 5 of 10 (F 1,24 = 23.451, P < 0.001), the collection year of the starting material (F 1,24 = 8.494, P < 0.01), and the interaction of both year and read number (F 1,24 = 10.563, P < 0.01). Whether a sample was from herbarium tissue or HPDL was not significant in predicting nuclear gene recovery ( Figure 1A). Furthermore, there was no correlation between the difference in number of genes recovered between paired HPDL and herbarium tissue samples across time ( Figure 1B). Finally, chloroplast coverage and the proportion of nuclear genes recovered were correlated (P = 0.008), but both are clearly influenced by the total number of reads ( Figure 1C). The proportion of single-copy nuclear genes exhibited a negative, but non-significant, correlation with genome size for both the herbarium (r = −0.26, P = 0.58) and HPDL samples (r = −0.37, P = 0.29).
Chloroplast polymorphisms were variable between herbarium tissue and frozen DNA stored samples. The Acacia koa herbarium tissue library did not have a significantly higher proportion of putative deaminating polymorphisms than the frozen DNA stored library (χ 2 = 0.322, P > 0.05, df = 1). Carex wahuensis also showed no significant difference in deamination between storage methods (χ 2 = 1.941, P > 0.05, df = 1). For almost all other paired comparisons, the frozen DNA stored library had too few heterozygous locations to establish baseline proportions for a χ 2 analysis. For example, in Deschampsia nubigena the herbarium tissue sample had 36 putative deamination sites out of 139 heterozygous sites total, in contrast to just one out of four sites in the frozen DNA stored library (Appendix S3). In two separate herbarium tissue/frozen DNA paired comparisons within the same species (Santalum ellipticum), both frozen DNA samples had zero heterozygous sites while the paired herbarium tissue samples had three and six putative deamination sites, respectively. In only one comparison (Sesbania tomentosa) did the frozen DNA stored sample have more heterozygous sites than the herbarium tissue sample (26 vs. 1), although none of the 26 represented putative deamination C-to-T substitutions (Appendix S3). Although sample sizes were low, no relationship was found for herbarium tissue samples between the number or proportion of putative deaminated sites with age (all tests P > 0.05), a pattern that was also found in the frozen DNA stored libraries (all tests P > 0.05).

New genetic resources in Hawaiian taxa
The Hawaiian flora is highly endemic and threatened; of the estimated 1400 native plant species, 90% are endemic and nearly a quarter are listed as threatened or endangered (Department of Land and Natural Resources, 2023;Pacific Islands Fish and Wildlife Office, 2023). The chloroplast genomes assembled here represent the first significant genomic resource for many of these taxa. For a few taxa, an appreciable number of nuclear genes were also recovered ( Table 1). We hope these new data can be the starting point for future projects in island biogeography, evolution, and rare plant conservation.

Effect of herbarium storage on plastid and nuclear genome recovery
Sequencing libraries generated from herbarium tissue had high proportions of adapter readthrough during Illumina sequencing, likely as a result of more highly degraded DNA. Despite this noticeable difference, many chloroplast genomes generated from herbarium tissue were recovered in full circular molecules, with a complete circular plastid genome being assembled from a herbarium tissue sample that was 27 years old. However, in some paired comparisons, the herbarium tissue sample assembly was incomplete relative to the assembly from the frozen DNA stored sample. In Acacia koa and Deschampsia nubigena, the sequencing libraries generated from herbarium tissue samples had chloroplast assemblies made up of 26 and 11 scaffolds, respectively, compared to the frozen DNA stored samples, for which chloroplast assemblies were in a single molecule (linear in the case of A. koa, circular in the case of D. nubigena). It is likely that there is a species-specific effect on DNA fragmentation, in addition to the effects of storage method and age. It is already established that inhibitory secondary compounds are affecting herbarium-based DNA extractions in particular species (Marinček et al., 2022). Furthermore, there could be an environmental factor, in that the original collections were made under varying conditions and may have been processed in slightly different ways for herbarium accessioning. Both the herbarium tissue samples and the HPDL samples displayed a non-significant trend in which species with smaller genomes had a higher proportion of nuclear single-copy genes recovered. Unfortunately, the relatively small size of our data set, paired with limited sampling in public databases, precluded us from exploring this relationship further in this work. The effects of genome size and/or ploidy on herbarium genomic studies should be an important consideration for future studies, as it may partially contribute to why certain taxa routinely yield less DNA after being stored on a herbarium sheet.

Effect of herbarium storage on deamination
DNA from herbarium tissue specimens has been noted to degrade, either due to the preservation process or slowly over time (Staats et al., 2011). Much of this degradation appears to happen in plastid genomes and results in the conversion of C bases to T bases as a result of cytosine deamination. There was no clear pattern of increasing deamination in herbarium tissue specimens compared to their frozen DNA counterparts, nor was there a clear pattern across time. In the two species that we could statistically assess (Acacia koa from 1998 and Carex wahuensis from 2019), neither had a significantly higher number of putative deamination sites in the herbarium tissue sample compared to the frozen DNA sample. This could be a function of the short amount of time since the collection of these specimens, resulting in minimal accumulation of spontaneous mutations in the herbarium tissue samples. DNA extracted from older herbarium tissues (1995) of Santalum ellipticum had a larger number (four vs. two) of putative deamination sites compared to a younger specimen (2007); however, the proportions of putative deamination sites were similar between these samples. Notably, no evidence of deamination was seen in either of the HPDL samples of these S. ellipticum specimens (1995 and 2007). Additionally, we can deduce that either lineage-specific or preparation-specific effects are likely at play as no evidence of deamination was seen in two herbarium tissue specimens of Clermontia from 2005. Previous work found that more than 20% of mutated sites were putatively due to deamination, representing a sizable fraction of polymorphism arising from stochastic rather than evolutionary processes (Staats et al., 2011). Here, the levels of polymorphism introduced by deamination were much lower, but also non-zero, and could feasibly affect phylogenetic or population genetic inference of slowly evolving plastid genomes.

Conclusions
Chloroplast assemblies and, to a lesser extent, nuclear genes, can be recovered from both herbarium-stored tissues and frozen DNA samples. In general, the frozen DNA stored libraries were of higher quality (e.g., less fragmented, higher coverage), but age and sequencing effort had significant effects on the resulting chloroplast assembly. Both herbaria-stored tissues and frozen DNA samples face potential pitfalls in terms of storage; the former is susceptible to pests and environmental conditions, while the latter can undergo catastrophic equipment failures. Using both storage methodsparticularly for rare and threatened floras-can represent an insurance policy for future studies. Moreover, while herbarium tissue-derived DNA has proven time and again to be useful in exploring a number of research questions, for studies requiring larger fragments from more intact DNA, freezer-stored DNA libraries will be invaluable.
AUTHOR CONTRIBUTIONS E.V.M. conceived of the study, analyzed the data, and wrote the manuscript; C.D. assisted with laboratory work including DNA extractions; M.Y. and C.M. provided HPDL samples and feedback on the manuscript; K.H. assisted with laboratory work and data analysis and helped to write the final manuscript. All authors approved the final version of the manuscript.

SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.
Appendix S1. Information on the weight of tissue used for DNA and all pre-and post-library quantification.